A Scalable Clustering Method for Categorical Sequences
Similar resources
Scalable Hierarchical Clustering Method for Sequences of Categorical Values
Data clustering methods have many applications in the area of data mining. Traditional clustering algorithms deal with quantitative or categorical data points. However, there exist many important databases that store categorical data sequences, where significant knowledge is hidden behind sequential dependencies between the data. In this paper we introduce a problem of clustering categorical da...
Clustering From Categorical Data Sequences
The three-parameter cluster model is a combinatorial stochastic process that generates categorical response sequences by randomly perturbing a fixed clustering parameter. This clear relationship between the observed data and the underlying clustering is particularly attractive in cluster analysis, in which supervised learning is a common goal and missing data is a familiar issue. The model is w...
Clustering Sequences of Categorical Values
Conceptual clustering is a discovery process that groups a set of data in the way that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. Traditional clustering algorithms employ some measure of distance between data points in n-dimensional space. However, not all data types can be represented in a metric space, therefore no natural distance function is ava...
A scalable algorithm for clustering protein sequences
The enormous growth of public sequence databases and continuing addition of fully sequenced genomes has created many challenges in developing novel and scalable computational techniques for searching, comparing, and analyzing these databases. Over the years, many methods have been developed for clustering proteins according to their sequence similarity. However, most of these methods tend to ha...
LIMBO: Scalable Clustering of Categorical Data
Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant ...
Journal
Journal title: Journal of Korean Institute of Intelligent Systems
Year: 2004
ISSN: 1976-9172
DOI: 10.5391/jkiis.2004.14.2.136